Constrained Reinforcement Learning in Hard Exploration Problems
Authors
Abstract
One approach to guaranteeing safety in Reinforcement Learning is through cost constraints that are dependent on the policy. Recent works in constrained RL have developed methods that ensure constraints are enforced even at learning time while maximizing the overall value of the policy. Unfortunately, as demonstrated by our experimental results, such approaches do not perform well on complex multi-level tasks with longer episode lengths or sparse rewards. To that end, we propose a scalable hierarchical approach for constrained RL problems that employs backward value functions in the context of a task hierarchy, together with a novel intrinsic reward function at the lower levels of the hierarchy, to enable cost constraint enforcement. One of our key contributions is proving that backward value functions are theoretically viable even when there are multiple levels of decision making. We also show that our new approach, referred to as Hierarchically Limited consTraint Enforcement (HiLiTE), significantly improves on the state of the art in Constrained RL for many benchmark problems from the literature. We further demonstrate that this performance (on value and constraint enforcement) clearly outperforms the best existing approaches for constrained RL.
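The core idea of enforcing a cost budget with backward value functions can be illustrated generically: an action is allowed only if the cost already incurred on the trajectory (the backward part) plus the estimated cost still to come (the forward part) stays within the budget. The sketch below is a minimal, hypothetical illustration under that assumption, not HiLiTE itself: it has no hierarchy and no intrinsic reward, it uses the realized cost of a single trajectory in place of an expected backward value function, and all function names (`backward_cost`, `safe_actions`, `pick_action`) and the fallback rule are invented for illustration.

```python
from typing import Dict, List, Sequence


def backward_cost(costs_so_far: Sequence[float]) -> float:
    """Cost already incurred on this trajectory.

    Hypothetical one-sample stand-in for a backward (cost) value function.
    """
    return float(sum(costs_so_far))


def safe_actions(costs_so_far: Sequence[float],
                 forward_cost_estimates: Dict[str, float],
                 budget: float) -> List[str]:
    """Keep actions whose past cost plus estimated future cost fits the budget."""
    spent = backward_cost(costs_so_far)
    return [action for action, future_cost in forward_cost_estimates.items()
            if spent + future_cost <= budget]


def pick_action(costs_so_far: Sequence[float],
                reward_estimates: Dict[str, float],
                forward_cost_estimates: Dict[str, float],
                budget: float) -> str:
    """Greedy on estimated reward, restricted to budget-feasible actions."""
    allowed = safe_actions(costs_so_far, forward_cost_estimates, budget)
    if not allowed:
        # Assumed fallback: if nothing fits the budget, take the least costly action.
        return min(forward_cost_estimates, key=forward_cost_estimates.get)
    return max(allowed, key=lambda action: reward_estimates[action])


if __name__ == "__main__":
    past = [1.0, 0.5]                                  # cost spent so far: 1.5
    rewards = {"left": 1.0, "right": 3.0, "stay": 0.0}
    future_costs = {"left": 1.0, "right": 4.0, "stay": 0.0}
    print(safe_actions(past, future_costs, budget=4.0))
    print(pick_action(past, rewards, future_costs, budget=4.0))
```

With a budget of 4.0, `right` is filtered out (1.5 + 4.0 exceeds the budget) even though it has the highest estimated reward, so the agent takes `left`. This is the trade-off such constraint enforcement imposes at learning time.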
Similar resources
Resource Constrained Exploration in Reinforcement Learning
This paper examines temporal difference reinforcement learning (RL) with adaptive and directed exploration for resource-limited missions. The scenario considered is for an energy-limited agent which must explore an unknown region to find new energy sources. The presented algorithm uses a Gaussian Process (GP) regression model to estimate the value function in an RL framework. However, to avoid ...
Learning to soar: Resource-constrained exploration in reinforcement learning
This paper examines temporal difference reinforcement learning with adaptive and directed exploration for resource-limited missions. The scenario considered is that of an unpowered aerial glider learning to perform energy-gaining flight trajectories in a thermal updraft. The presented algorithm, eGP-SARSA(l), uses a Gaussian process regression model to estimate the value function in a reinforcem...
Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning
We introduce SCAL, an algorithm designed to perform efficient exploration-exploitation in any unknown weakly-communicating Markov Decision Process (MDP) for which an upper bound c on the span of the optimal bias function is known. For an MDP with S states, A actions and Γ ≤ S possible next states, we prove a regret bound of Õ(c√(ΓSAT)), which significantly improves over existing algorithms (e....
Efficient Exploration in Reinforcement Learning
An agent acting in a world makes observations, takes actions, and receives rewards for the actions taken. Given a history of such interactions, the agent must make the next choice of action so as to maximize the long term sum of rewards. To do this well, an agent may take suboptimal actions which allow it to gather the information necessary to later take optimal or near-optimal actions with res...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2023
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v37i12.26757